feat(pipeline): allow syncing blocks ontop of the proposed chain#21025
feat(pipeline): allow syncing blocks ontop of the proposed chain#21025
Conversation
8de6157 to
31f941d
Compare
3a75fdb to
1ca98d8
Compare
| // Atomically set the pending checkpoint number alongside the block if provided | ||
| pendingCheckpointNumber !== undefined && | ||
| this.store.blockStore.setPendingCheckpointNumber(pendingCheckpointNumber), |
There was a problem hiding this comment.
This doesn't match the comment that this Sets the pending checkpoint number (quorum-attested but not yet L1-confirmed), right?
That said, let's discuss the model. Weirdly, I like the concept of having an uncheckpointed checkpoint. But it seems like we have two different things:
- A checkpoint-being-built, which is the checkpoint being built via proposed blocks. This is just a checkpoint number, since a
Checkpointobject needs all its blocks to be ready. Today we already have this, but don't need to explicitly store it. - A checkpoint-being-proposed, which is the
Checkpointobject for which the current proposer has sent a checkpoint proposal, and would ultimately make it onto L1.
We need to define which ones we expose to clients of the archiver, and also to users via APIs like getL2Tips.
6a16e98 to
28a0520
Compare
27f8a4b to
f5c6308
Compare
b1b1b14 to
ce0e422
Compare
f5c6308 to
ae6b651
Compare
ce0e422 to
d8b8d31
Compare
3a47fbf to
4bb6845
Compare
d8b8d31 to
6973d24
Compare
4bb6845 to
0e9b52e
Compare
0e9b52e to
dee426a
Compare
6973d24 to
21cdcd5
Compare
21cdcd5 to
9543a61
Compare
9bba2e5 to
4902133
Compare
2548aed to
4c620ca
Compare
| } | ||
|
|
||
| /** Sets the pipelining tree-in-progress boundary for building ahead of L1 confirmation. */ | ||
| public setPipeliningTreeInProgress(value: bigint): Promise<void> { |
There was a problem hiding this comment.
this was added to deal with #21494 - i did not want to directly set the other tree in progress, I think my current solution is nasty and would appreciate a discussion around it
There was a problem hiding this comment.
I'm not sure I understand the need for this, even after going through all its uses. The tree-in-progress is a property of the Inbox L1 contract, not of the L2 chain, it's goal being not trying to consume messages for a given checkpoint too early. If the check added in 21494 is throwing, it means we have a bigger problem and need to adjust the inbox lag.
| debugLogStore, | ||
| ); | ||
|
|
||
| // Register a callback for all nodes to set the pending checkpoint number when a checkpoint proposal is received. |
There was a problem hiding this comment.
Adds a lovely callback, the existing one was only relevant to nodes performing validation duties - allNodesCheckpointProposalHandler is triggered by everyone, not just validators
There was a problem hiding this comment.
Today the validator-client is running a bunch of validations after it receives the proposal from libp2p service (they are not done in libp2p service mostly because they are slow). This callback means a node will push into their archiver potentially invalid checkpoint proposals, which is bad.
And yeah, I know it's confusing, since the method in libp2pservice is called processValidCheckpointProposal, even though the proposal is still missing a bunch of validations.
We had had this same issue with mbps, since now regular nodes needed to start following block proposals. So we extracted block validation logic to a block proposal handler, which is installed always regardless of the node being a validator or not. Maybe we should do the same here?
| } | ||
|
|
||
| /** Merges multiple StateOverride arrays, combining stateDiff entries for the same address. */ | ||
| public static mergeStateOverrides(...overrides: StateOverride[]): StateOverride { |
There was a problem hiding this comment.
This duplicates packing the fee slot
Whenever checkpoint blocks become very full, the fee will drift from block to block, this prevents simulating into the future failing
| // Knowledege of pending checkpoints is in the PR above | ||
| const { targetSlot } = this.epochCache.getTargetAndNextSlot(); | ||
| if (targetSlot <= this.lastSlotProcessed) { | ||
| let slot; |
4c620ca to
c2cc23c
Compare
c2cc23c to
366f809
Compare
| await store.addCheckpoints([checkpoint1]); | ||
|
|
||
| // Set pending checkpoint to 3 (far ahead) | ||
| await store.blockStore.setPendingCheckpoint({ |
There was a problem hiding this comment.
You have a good point, will restrict
| await store.addCheckpoints([checkpoint1, checkpoint2, checkpoint3]); | ||
|
|
||
| // Set pending to 2 (already confirmed, but that's fine for the test) | ||
| await store.blockStore.setPendingCheckpoint({ |
There was a problem hiding this comment.
Should this be allowed either?
| lastArchive: block2.archive, | ||
| }); | ||
|
|
||
| // Block for checkpoint 2 should work (previous confirmed = 1) |
There was a problem hiding this comment.
Maybe I misunderstand the flow. But shoudl we allow adding proposed blocks for multiple different checkpoints?
Unless we are already potentially supporting build ahead of multiple checkpoints?
There was a problem hiding this comment.
It has been restricted to +1
| block: { number: provenBlockNumber, hash: provenBlockData.blockHash.toString() }, | ||
| checkpoint: provenCheckpointId, | ||
| }, | ||
| pendingCheckpoint: pendingCheckpointBlockData |
There was a problem hiding this comment.
Shall we just call this pending?
There was a problem hiding this comment.
We definitely need to rethink naming. The rollup contract already has the concept of "pending checkpoint" which is actually the checkpointed checkpoint. We need to change either.
Other thoughts, in random order, some mutually exclusive:
- Do we want to bundle the pending checkpoint with the
proposedchain tip? It'll break the property that, in a chain tip, the reported block matches the last block of the reported checkpoint. But maybe it's fine. - Do we want to differentiate attested vs non-attested pending checkpoints? Personally I don't think so.
- Should the
pendingCheckpointtip return thecheckpointedone if there's no pending checkpoint block data? It's the only chain tip that may be undefined. - Do we want to make a bigger rename? It seems like we're dealing with "pending checkpoints" and "checkpointed checkpoints". Maybe "checkpoint" was the bad name, and we should be talking about "pending bundles" and "checkpointed bundles" or something like that?
No need to make these naming changes in this PR, but let's give it a good thought before closing this epic.
| // The checkpoint proposal often arrives before the last block finishes re-execution. | ||
| // Trigger a sync to flush any queued blocks, then retry. | ||
| if (!blockData) { | ||
| await archiver.syncImmediate(); |
There was a problem hiding this comment.
Will probably need a retry loop here won't we?
| this.allNodesCheckpointReceivedCallback = ( | ||
| _checkpoint: CheckpointProposalCore, | ||
| ): Promise<CheckpointAttestation[] | undefined> => { | ||
| return Promise.resolve(undefined); |
There was a problem hiding this comment.
Should probably log an error here shouldn't we? Everyone should register a handler here.
| this.validatorCheckpointReceivedCallback = ( | ||
| checkpoint: CheckpointProposalCore, | ||
| ): Promise<CheckpointAttestation[] | undefined> => { | ||
| this.logger.debug( |
There was a problem hiding this comment.
And this log line could maybe go if this is validator only. Otherwise full nodes will print this routinely.
dcd68a6 to
2d31671
Compare
| const canProposeCheck = await publisher.canProposeAt(syncedTo.archive, proposer ?? EthAddress.ZERO, { | ||
| ...invalidateCheckpoint, | ||
| }); | ||
| // Determine the correct archive and L1 state overrides for the canProposeAt check. |
There was a problem hiding this comment.
This block has been edited since last review @PhilWindle
| } | ||
|
|
||
| // What's the slot of the first uncheckpointed block? | ||
| // Don't prune blocks that are covered by a pending checkpoint (awaiting L1 submission from pipelining) |
There was a problem hiding this comment.
Slightly confused by this. What happens if a checkpoint fails to land on L1? Surely the blocks covered by that (pending) checkpoint are removed? As well as all blocks built afterwards?
There was a problem hiding this comment.
ah, md/pipeline-recovery-2 im dealing with this currently in this branch
2d31671 to
2591475
Compare
| // Trigger syncs to flush any queued blocks, retrying until we find the data or give up. | ||
| if (!blockData) { | ||
| blockData = await retryUntil( | ||
| async () => { |
There was a problem hiding this comment.
I think we probably want to
- Not call
syncImmediatehere. That's going to force the archiver to query L1 aggressively. When the block is pushed to the archiver it already calls that function, se we aren't going to make things go any faster. - Use a more appropriate timeout. Presumably that wuld be the end of the slot?
| /** Storage format for a pending checkpoint (attested but not yet L1-confirmed). */ | ||
| type PendingCheckpointStore = { | ||
| header: Buffer; | ||
| checkpointNumber: number; | ||
| startBlock: number; | ||
| blockCount: number; | ||
| totalManaUsed: string; | ||
| feeAssetPriceModifier: string; | ||
| }; |
There was a problem hiding this comment.
Why can't we capture archive, outhash, or all data that's not L1 or attestations?
Also, nit: rename to PendingCheckpointStorage for consistency with the other types here.
There was a problem hiding this comment.
it can, I just kept the minimum; will add
| // The same check as above but for checkpoints. Accept the block if either the confirmed | ||
| // checkpoint or the pending (locally validated but not yet confirmed) checkpoint matches. | ||
| const expectedCheckpointNumber = blockCheckpointNumber - 1; | ||
| if ( | ||
| !opts.force && | ||
| previousCheckpointNumber !== expectedCheckpointNumber && | ||
| pendingCheckpointNumber !== expectedCheckpointNumber | ||
| ) { | ||
| const [reported, source]: [CheckpointNumber, 'confirmed' | 'pending'] = | ||
| pendingCheckpointNumber > previousCheckpointNumber | ||
| ? [pendingCheckpointNumber, 'pending'] | ||
| : [previousCheckpointNumber, 'confirmed']; | ||
| throw new CheckpointNumberNotSequentialError(blockCheckpointNumber, reported, source); |
There was a problem hiding this comment.
Is there any situation where addProposedBlock would add a block for the pending checkpoint? My understanding is we add proposed blocks, then throw a checkpoint proposal on top to flag those as "pending checkpointing", and then keep adding proposed blocks for the next one.
There was a problem hiding this comment.
Also, adding a block to a pending checkpoint breaks the blockCount property of the PendingCheckpointStore. Seems like we should not allow that.
There was a problem hiding this comment.
No, we should not end up adding directly to the pending checkpoint, only above it.
This case is to allow building ontop of the pending checkpoint - not for. But it looks like it may allow what you have mentioned, I'll make it more strict
| block: { number: provenBlockNumber, hash: provenBlockData.blockHash.toString() }, | ||
| checkpoint: provenCheckpointId, | ||
| }, | ||
| pendingCheckpoint: pendingCheckpointBlockData |
There was a problem hiding this comment.
We definitely need to rethink naming. The rollup contract already has the concept of "pending checkpoint" which is actually the checkpointed checkpoint. We need to change either.
Other thoughts, in random order, some mutually exclusive:
- Do we want to bundle the pending checkpoint with the
proposedchain tip? It'll break the property that, in a chain tip, the reported block matches the last block of the reported checkpoint. But maybe it's fine. - Do we want to differentiate attested vs non-attested pending checkpoints? Personally I don't think so.
- Should the
pendingCheckpointtip return thecheckpointedone if there's no pending checkpoint block data? It's the only chain tip that may be undefined. - Do we want to make a bigger rename? It seems like we're dealing with "pending checkpoints" and "checkpointed checkpoints". Maybe "checkpoint" was the bad name, and we should be talking about "pending bundles" and "checkpointed bundles" or something like that?
No need to make these naming changes in this PR, but let's give it a good thought before closing this epic.
| } | ||
|
|
||
| // What's the slot of the first uncheckpointed block? | ||
| // Don't prune blocks that are covered by a pending checkpoint (awaiting L1 submission from pipelining) |
| const current = await this.getPendingCheckpointNumber(); | ||
| if (pending.checkpointNumber <= current) { | ||
| this.#log.warn(`Ignoring stale pending checkpoint number ${pending.checkpointNumber} (current: ${current})`); | ||
| return; | ||
| } | ||
| const confirmed = await this.getLatestCheckpointNumber(); | ||
| if (pending.checkpointNumber !== confirmed + 1) { | ||
| this.#log.warn( | ||
| `Ignoring pending checkpoint ${pending.checkpointNumber}: expected ${confirmed + 1} (confirmed + 1)`, | ||
| ); | ||
| return; | ||
| } |
There was a problem hiding this comment.
Unless we can think of legitimate situations for this, I'd throw instead of warning. It will help us catch inconsistencies easier.
| if (this.interrupted) { | ||
| return undefined; | ||
| } | ||
| return this.sendRequests(); |
There was a problem hiding this comment.
Should we re-simulate before sending, this time with the actual state of L1, instead of the manual overrides? This should help catch scenarios where the previous checkpoint didn't behave as we expected (not to mention the ones where it didn't land).
There was a problem hiding this comment.
#21250 in here i add a preCheck hook to each submission that runs after the enqueue sleep - without the overrides
| const grandparentCheckpointNumber = CheckpointNumber(this.checkpointNumber - 2); | ||
| const [grandparentCheckpoint, manaTarget] = await Promise.all([ | ||
| rollup.getCheckpoint(grandparentCheckpointNumber), | ||
| rollup.getManaTarget(), | ||
| ]); |
There was a problem hiding this comment.
I guess this works because we're guaranteed to have a grandparent checkpoint if there's a non-undefined pendingCheckpointData? Otherwise I see this failing in the first checkpoint(s)?
| /** The last checkpoint proposal job, tracked so we can await its pending L1 submission during shutdown. */ | ||
| private lastCheckpointProposalJob: CheckpointProposalJob | undefined; |
There was a problem hiding this comment.
Stupid question: why do we want to await it?
There was a problem hiding this comment.
i was just thinking it has already been validated, and youll get rewards may aswell wait to get em. dont know if theres a clear need, just thought it was nice to have.
| if (invalidateCheckpoint) { | ||
| // After invalidation, L1 will roll back to checkpoint N-1. The archive at N-1 already | ||
| // exists on L1, so we just pass the matching archive (the lastArchive of the invalid checkpoint). | ||
| archiveForCheck = invalidateCheckpoint.lastArchive; | ||
| l1Overrides.forcePendingCheckpointNumber = invalidateCheckpoint.forcePendingCheckpointNumber; | ||
| this.metrics.recordPipelineDepth(0); | ||
| } else if (this.epochCache.isProposerPipeliningEnabled() && syncedTo.hasPendingCheckpoint) { | ||
| // Parent checkpoint hasn't landed on L1 yet. Override both the pending checkpoint number | ||
| // and the archive at that checkpoint so L1 simulation sees the correct chain tip. | ||
| const parentCheckpointNumber = CheckpointNumber(checkpointNumber - 1); | ||
| l1Overrides.forcePendingCheckpointNumber = parentCheckpointNumber; | ||
| l1Overrides.forceArchive = { checkpointNumber: parentCheckpointNumber, archive: syncedTo.archive }; | ||
| this.metrics.recordPipelineDepth(1); | ||
|
|
||
| this.log.verbose( | ||
| `Building on top of pending checkpoint (pending=${syncedTo.pendingCheckpointData?.checkpointNumber})`, | ||
| ); |
There was a problem hiding this comment.
Shouldn't the order of this ifs be switched? If there's a pending checkpoint AND the tip of the checkpointed chain is invalid, we should expect the pending checkpoint to perform the invalidation when it lands, so we should build on top of the pending checkpoint instead.
| * Note: this checks against the checkpointed chain (L1-confirmed state), not the proposed chain. | ||
| */ | ||
| protected async checkSync(args: { ts: bigint; slot: SlotNumber }): Promise<SequencerSyncCheckResult | undefined> { | ||
| // Check that the archiver has fully synced the L2 slot before the one we want to propose in. |
There was a problem hiding this comment.
Need to update this comment

Overview
Key contributions:
Adds a second p2p callback that separates what runs for all nodes / validator nodes
Testing
epochs_mbps.pipeline now expects 3 blocks per checkpoint, just like the original epochs_mbps test, now it is fully pipelined.
Upcoming